Disclosure Control by Computer Scientists: An Overview and an Application of Microaggregation to Mobility Data Anonymization
Abstract
Privacy-preserving data mining (PPDM) is a subdiscipline of computer science that in many respects parallels statistical disclosure control (SDC) within statistics. See [12] for a survey of recent developments in PPDM. We focus here on the connections between k-anonymity, a concept that arose in the PPDM community, and microaggregation, a family of methods developed within SDC. These connections are discussed at a conceptual level in Section 2. We then move to the anonymization of mobility data, i.e. trajectories, a very dynamic area in PPDM and a completely neglected one in SDC. In Section 3 we apply the microaggregation approach to k-anonymize real-world trajectories. We present a new distance measure for spatio-temporal data that facilitates the microaggregation process. The measure naturally considers both spatial and temporal aspects, can be fine-tuned for specific applications, and can be instantiated with existing measures for spatial data, sequences, or time series. Conclusions are summarized in Section 4.
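To make the link between microaggregation and k-anonymity concrete, here is a minimal sketch of fixed-size univariate microaggregation on a single numeric attribute. It is not the trajectory method of the paper, which relies on a spatio-temporal distance not reproduced here. The function names `microaggregate` and `is_k_anonymous` and the sample ages are illustrative assumptions.

```python
from collections import Counter


def microaggregate(values, k):
    """Illustrative fixed-size univariate microaggregation (hypothetical helper).

    Sorts the records, partitions them into consecutive groups of k
    (the last group absorbs any remainder, so its size is between k and 2k - 1),
    and replaces every value by the mean of its group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n < k:
        raise ValueError("need at least k records")

    # Indices of the records in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    n_groups = n // k
    result = [0.0] * n

    for g in range(n_groups):
        start = g * k
        # The final group takes all leftover records.
        end = n if g == n_groups - 1 else start + k
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


def is_k_anonymous(values, k):
    """True if every distinct value is shared by at least k records."""
    return all(count >= k for count in Counter(values).values())


ages = [23, 25, 31, 35, 38, 52, 60]  # made-up sample data
masked = microaggregate(ages, 3)
```

Because every group holds at least k records and all of them receive the same mean, each masked value is shared by at least k records, which is the k-anonymity condition when this attribute is treated as the only quasi-identifier. Calling `is_k_anonymous(masked, 3)` checks this directly.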
Similar resources
Utility preserving query log anonymization via semantic microaggregation
Query logs are of great interest for scientists and companies for research, statistical and commercial purposes. However, the availability of query logs for secondary uses raises privacy issues since they allow the identification and/or revelation of sensitive information about individual users. Hence, query anonymization is crucial to avoid identity disclosure. To enable the publication of pri...
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling
In recent years, privacy concerns about social network graph data publishing have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...
Semantic microaggregation for the anonymization of query logs using the open directory project
Web search engines gather information from the queries performed by the user in the form of query logs. These logs are extremely useful for research, marketing, or profiling, but at the same time they are a great threat to the user’s privacy. We provide a novel approach to anonymize query logs so they ensure user k-anonymity, by extending a common method used in statistical disclosure control: ...
Improved Univariate Microaggregation for Integer Values
Privacy issues during data publishing are an increasing concern of involved entities. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...